Overview

In this course we are going to introduce basic analysis for single-cell RNAseq, with a specific focus on the 10X system. The course is divided into multiple sessions.

In this second session, we will introduce Seurat and Bioconductor methodologies for basic scRNA-seq QC, filtering and initial analysis of single samples.

The data

For these sessions we are going to make use of two datasets.

The first set will be from the recent paper Enteroendocrine cell lineages that differentially control feeding and gut motility.
This contains scRNA data from either Neurod1 and Neurog3 expressing enteroendocrine cells.

The second dataset is the classic example from PBMC cells.

Seurat and Bioconductor

The Seurat R package and associated packages offer an R based and well established methodology for the analysis of single cell RNA-seq, ATAC-seq and many other single-cell sequencing methodologies.

Seurat page

Bioconductor has a set of interrelated and highly connected software packages for single cell RNAseq as well as integrating with the rest of the Bioconductor software ecosystem. OSCA book

In this session we will run through basic analyses using both of the methodologies to show both their equvalence in some parts of the analysis as well as their distinct functionalities in others.

In practice, you may select methods from either softwares and convert between the two as required.

The Example Data

For this session we will need the filtered and raw expression data for the Neurod1 datasets.

The filtered matrix can be found here

The raw matrix can be found here here

Bioconductor methods


Data import (Bioconductor)

First we need to load the DropletUtils package to read and handle our Droplet 10X data.

library(DropletUtils)

Read10x

We can then read in the filtered matrix containing data on droplets marked as cells by CellRanger.

We set the row.names argument for easier interpretation.

h5file <- "https://rubioinformatics.s3.amazonaws.com/scRNA_graduate/SRR_NeuroD1/outs/filtered_feature_bc_matrix.h5"
local_h5file <- basename(h5file)
download.file(h5file, local_h5file)

sce.NeuroD1_filtered <- read10xCounts(local_h5file, col.names = TRUE, row.names = "symbol")

class(sce.NeuroD1_filtered)
## [1] "SingleCellExperiment"
## attr(,"package")
## [1] "SingleCellExperiment"

SingleCellExperiment Object

The created SingleCellExperiment Object is much like our SummarizedExperiment object

sce.NeuroD1_filtered
## class: SingleCellExperiment 
## dim: 39905 3468 
## metadata(1): Samples
## assays(1): counts
## rownames(39905): Gm26206 Xkr4 ... MT-TrnT MT-TrnP
## rowData names(3): ID Symbol Type
## colnames(3468): AAACCCACACAATGTC-1 AAACCCACACCACTGG-1 ...
##   TTTGTTGTCTCTTAAC-1 TTTGTTGTCTGGACCG-1
## colData names(2): Sample Barcode
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):

SingleCellExperiment Object

The colnames() and rownames() functions can be used to access column names (cell-barcodes) and row names (gene identifiers)

# cell information
colnames(sce.NeuroD1_filtered)[1:2]
## [1] "AAACCCACACAATGTC-1" "AAACCCACACCACTGG-1"
# gene information
rownames(sce.NeuroD1_filtered)[1:2]
## [1] "Gm26206" "Xkr4"

SingleCellExperiment Object

The colData() and rowData() functions can be used to access experiment metadata.

# cell information
colData(sce.NeuroD1_filtered)[1:2, ]
## DataFrame with 2 rows and 2 columns
##                                    Sample            Barcode
##                               <character>        <character>
## AAACCCACACAATGTC-1 filtered_feature_bc_.. AAACCCACACAATGTC-1
## AAACCCACACCACTGG-1 filtered_feature_bc_.. AAACCCACACCACTGG-1
# gene information
rowData(sce.NeuroD1_filtered)[1:2, ]
## DataFrame with 2 rows and 3 columns
##                  ID      Symbol            Type
##         <character> <character>     <character>
## Gm26206     Gm26206     Gm26206 Gene Expression
## Xkr4           Xkr4        Xkr4 Gene Expression

SingleCellExperiment Object

A reducedDim and reducedDimNames slots at present remains unfilled.

The metadata slot contains the Sample names. Here all these cells came from single sample

# cell information
reducedDimNames(sce.NeuroD1_filtered)
## character(0)
# gene information
metadata(sce.NeuroD1_filtered)
## $Samples
## [1] "filtered_feature_bc_matrix.h5"

The knee plot

One of the first plots we may want to recreate from the QC is the knee plot.

To do this we will need the unfiltered matrix too.

The unfiltered SingleCellExperiment contains over a million droplets.

h5file <- "https://rubioinformatics.s3.amazonaws.com/scRNA_graduate/SRR_NeuroD1/outs/raw_feature_bc_matrix.h5"
local_h5file <- "NeuroD1_raw_feature_bc_matrix.h5"
download.file(h5file, local_h5file)
sce.NeuroD1_unfiltered <- read10xCounts(local_h5file, col.names = TRUE)
sce.NeuroD1_unfiltered
## class: SingleCellExperiment 
## dim: 39905 1244317 
## metadata(1): Samples
## assays(1): counts
## rownames(39905): Gm26206 Xkr4 ... MT-TrnT MT-TrnP
## rowData names(3): ID Symbol Type
## colnames(1244317): AAACCCAAGAAACACT-1 AAACCCAAGAAACCAT-1 ...
##   TTTGTTGTCTTTGCTA-1 TTTGTTGTCTTTGTCG-1
## colData names(2): Sample Barcode
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):

Barcode ranks

We can use the barcodeRanks function to create a table of barcode ranks and total reads per droplet.

bcrank <- barcodeRanks(counts(sce.NeuroD1_unfiltered))
bcrank
## DataFrame with 1244317 rows and 3 columns
##                         rank     total    fitted
##                    <numeric> <integer> <numeric>
## AAACCCAAGAAACACT-1     45958       155        NA
## AAACCCAAGAAACCAT-1    102639         4        NA
## AAACCCAAGAAACCCG-1    629672         1        NA
## AAACCCAAGAAACCTG-1    629672         1        NA
## AAACCCAAGAAAGAAC-1   1096187         0        NA
## ...                      ...       ...       ...
## TTTGTTGTCTTTGATC-1   36930.5       167        NA
## TTTGTTGTCTTTGCAT-1  629672.5         1        NA
## TTTGTTGTCTTTGCGC-1  629672.5         1        NA
## TTTGTTGTCTTTGCTA-1  629672.5         1        NA
## TTTGTTGTCTTTGTCG-1  629672.5         1        NA

Retrieve filtered barcodes

We then filter for droplets with the same rank for visualisation purposes.

bcrank$filtered <- rownames(bcrank) %in% colnames(sce.NeuroD1_filtered)
bc_plot <- as.data.frame(bcrank)
bc_plot <- bc_plot[order(bc_plot$filtered, decreasing = TRUE), ]
bc_plot <- bc_plot[!duplicated(bc_plot$rank), ]

Plotting barcode rank

We can now recreate the knee plot from CellRanger.

require(ggplot2)
ggplot(bc_plot, aes(x = rank, y = total, colour = filtered, alpha = 0.001)) + geom_point() +
    scale_y_log10() + scale_x_log10() + theme_minimal() + geom_hline(yintercept = metadata(bcrank)$inflection,
    colour = "darkgreen", linetype = 2) + geom_hline(yintercept = metadata(bcrank)$knee,
    colour = "dodgerblue", linetype = 2)
## Warning in scale_y_log10(): log-10 transformation introduced infinite values.

Filtering droplets using Droplet utils

Although CellRanger provides a filtered Droplet matrix, you may wish to provide your own cut-off based on knee plots generated and/or use the emptyDrops method in Droplet utils to identify cell containing droplets.

e.out <- emptyDrops(counts(sce.NeuroD1_unfiltered))
table(e.out)
head(e.out)
## DataFrame with 6 rows and 5 columns
##                        Total   LogProb    PValue   Limited       FDR
##                    <integer> <numeric> <numeric> <logical> <numeric>
## AAACCCAAGAAACACT-1       118  -350.537   0.30007     FALSE         1
## AAACCCAAGAAACCAT-1         2        NA        NA        NA        NA
## AAACCCAAGAAACCCG-1         0        NA        NA        NA        NA
## AAACCCAAGAAACCTG-1         0        NA        NA        NA        NA
## AAACCCAAGAAAGAAC-1         0        NA        NA        NA        NA
## AAACCCAAGAAAGCGA-1         1        NA        NA        NA        NA

Bioconductor - QC


Working with CellRanger filtered matrix

From this point we will use the Droplets filtered using Cellranger

sce.NeuroD1_filtered
## class: SingleCellExperiment 
## dim: 39905 3468 
## metadata(1): Samples
## assays(1): counts
## rownames(39905): Gm26206 Xkr4 ... MT-TrnT MT-TrnP
## rowData names(3): ID Symbol Type
## colnames(3468): AAACCCACACAATGTC-1 AAACCCACACCACTGG-1 ...
##   TTTGTTGTCTCTTAAC-1 TTTGTTGTCTGGACCG-1
## colData names(2): Sample Barcode
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):

Add QC metrics

To assess QC we first want to identify mitochondrial and ribosomal genes.

High mitochondrial gene expression within a cell is often used as a marker of dying cells.

is.mito <- grepl("^MT", rowData(sce.NeuroD1_filtered)$Symbol)
# is.ribo <- grepl('^Rps',rowData(sce.NeuroD1_filtered)$Symbol)

table(is.mito)
## is.mito
## FALSE  TRUE 
## 39868    37
# table(is.ribo)

Add QC metrics

The scuttle package has functions for normalisation, transformation and QC.

Here we us the addPerCellQCMetrics function to update our SingleCellExperiment object with QC information.

library(scuttle)
sce.NeuroD1_filtered <- addPerCellQCMetrics(sce.NeuroD1_filtered, subsets = list(Mito = is.mito))
sce.NeuroD1_filtered
## class: SingleCellExperiment 
## dim: 39905 3468 
## metadata(1): Samples
## assays(1): counts
## rownames(39905): Gm26206 Xkr4 ... MT-TrnT MT-TrnP
## rowData names(3): ID Symbol Type
## colnames(3468): AAACCCACACAATGTC-1 AAACCCACACCACTGG-1 ...
##   TTTGTTGTCTCTTAAC-1 TTTGTTGTCTGGACCG-1
## colData names(8): Sample Barcode ... subsets_Mito_percent total
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):

QC in colData

The QC data per Droplet has been added to additional columns in the colData slot of our SingleCellExperiment.

qc_df <- colData(sce.NeuroD1_filtered)
head(qc_df, n = 2)
## DataFrame with 2 rows and 8 columns
##                                    Sample            Barcode       sum  detected
##                               <character>        <character> <numeric> <integer>
## AAACCCACACAATGTC-1 filtered_feature_bc_.. AAACCCACACAATGTC-1     23320      4776
## AAACCCACACCACTGG-1 filtered_feature_bc_.. AAACCCACACCACTGG-1      1944       176
##                    subsets_Mito_sum subsets_Mito_detected subsets_Mito_percent
##                           <numeric>             <integer>            <numeric>
## AAACCCACACAATGTC-1             3358                    19              14.3997
## AAACCCACACCACTGG-1             1577                    15              81.1214
##                        total
##                    <numeric>
## AAACCCACACAATGTC-1     23320
## AAACCCACACCACTGG-1      1944

Plotting QC data

We can then plot this using the plotColData function from the Scater package.

First we plot the distritbution of total reads per cell.

library(scater)
plotColData(sce.NeuroD1_filtered, x = "Sample", y = "sum")

Plotting QC data

Then we plot the distritbution of total detected genes per cell.

library(scater)
plotColData(sce.NeuroD1_filtered, x = "Sample", y = "detected")

Plotting QC data

Then we plot the distritbution of percent of mitochondrial reads per cell.

library(scater)
plotColData(sce.NeuroD1_filtered, x = "Sample", y = "subsets_Mito_percent")

Plotting QC data

It also can be useful to capture the QC columns against each other as scatterplots

library(scater)
p1 <- plotColData(sce.NeuroD1_filtered, x = "sum", y = "detected")
p2 <- plotColData(sce.NeuroD1_filtered, x = "sum", y = "subsets_Mito_percent")
p3 <- plotColData(sce.NeuroD1_filtered, x = "detected", y = "subsets_Mito_percent")
gridExtra::grid.arrange(p1, p2, p3, ncol = 3)

Removing low quality data

At this point we may want to remove low quality data.

Typically droplet filters on mitochondrial read content, total detect genes and total read counts are applied.

qc.high_lib_size <- colData(sce.NeuroD1_filtered)$sum > 125000
qc.min_detected <- colData(sce.NeuroD1_filtered)$detected < 200
qc.mito <- colData(sce.NeuroD1_filtered)$subsets_Mito_percent > 25
discard <- qc.high_lib_size | qc.mito | qc.min_detected
DataFrame(LibSize = sum(qc.high_lib_size), Detected = sum(qc.min_detected), MitoProp = sum(qc.mito),
    Total = sum(discard))
## DataFrame with 1 row and 4 columns
##     LibSize  Detected  MitoProp     Total
##   <integer> <integer> <integer> <integer>
## 1        13       473      1767      1792

Removing low quality data

We will add this back in the colData slot to perform some final QC plots before filtering

colData(sce.NeuroD1_filtered) <- cbind(colData(sce.NeuroD1_filtered), DataFrame(toDiscard = discard))
p1 <- plotColData(sce.NeuroD1_filtered, x = "sum", y = "detected", colour_by = "toDiscard")
p2 <- plotColData(sce.NeuroD1_filtered, x = "sum", y = "subsets_Mito_percent", colour_by = "toDiscard")
p3 <- plotColData(sce.NeuroD1_filtered, x = "detected", y = "subsets_Mito_percent",
    colour_by = "toDiscard")
gridExtra::grid.arrange(p1, p2, p3, ncol = 3)

Removing low quality data

At this point we may want to remove the low quality cells from further analysis

sce.NeuroD1_filtered_QCed <- sce.NeuroD1_filtered[, sce.NeuroD1_filtered$toDiscard %in%
    "FALSE"]
sce.NeuroD1_filtered_QCed
## class: SingleCellExperiment 
## dim: 39905 1676 
## metadata(1): Samples
## assays(1): counts
## rownames(39905): Gm26206 Xkr4 ... MT-TrnT MT-TrnP
## rowData names(3): ID Symbol Type
## colnames(1676): AAACCCACACAATGTC-1 AAACCCACACGAGAAC-1 ...
##   TTTGGAGTCCGTAGGC-1 TTTGTTGGTCTAACTG-1
## colData names(9): Sample Barcode ... total toDiscard
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):

Log Normalisation

As with our RNA-seq and genomics data we want to normalise for total library size and perfrom a log2 transformation.

sce.NeuroD1_filtered_QCed <- logNormCounts(sce.NeuroD1_filtered_QCed)
assayNames(sce.NeuroD1_filtered_QCed)
## [1] "counts"    "logcounts"

Log Normalisation 2

An alternative approach for scRNA is to normalise accounting for potentially different cell-populations.

As we dont know cell populations yet, a quick clustering of data followed by this normalisation using clusters is performed.

library(scran)
clust.sce.NeuroD1_filtered_QCed <- quickCluster(sce.NeuroD1_filtered_QCed)
sce.NeuroD1_filtered_trimmed <- computeSumFactors(sce.NeuroD1_filtered_QCed, cluster = clust.sce.NeuroD1_filtered_QCed)
sce.NeuroD1_filtered_QCed <- logNormCounts(sce.NeuroD1_filtered_QCed)
assayNames(sce.NeuroD1_filtered_QCed)
## [1] "counts"    "logcounts"

Modeling the mean-variance relationship

As with RNA-seq we want to account for mean-dispersion relationship.

In Bioconductor we can use the modelGeneVar function to perform this for us.

##
dec.NeuroD1_filtered_QCed <- modelGeneVar(sce.NeuroD1_filtered_QCed)
ggplot(dec.NeuroD1_filtered_QCed, aes(x = mean, y = total)) + geom_point()

Identifying top variable genes for PCA

Having fit the mean-dispersion relationship we can extract the most variable genes for use in PCA dimension reduction.

top.NeuroD1_filtered_QCed <- getTopHVGs(dec.NeuroD1_filtered_QCed, n = 3000)

PCA

We can now perform a PCA on our data.

The PCs will be used for clustering and 2D-representation of our data.

set.seed(100)  # See below.
sce.NeuroD1_filtered_QCed <- fixedPCA(sce.NeuroD1_filtered_QCed, subset.row = top.NeuroD1_filtered_QCed)
reducedDimNames(sce.NeuroD1_filtered_QCed)
## [1] "PCA"

PCA

We can produce a quick plot of the PCs to see if there are any biases immediately evident in the PCs.

plotReducedDim(sce.NeuroD1_filtered_QCed, dimred = "PCA", colour_by = "subsets_Mito_percent")

Select PCs from PCA

For further analysis we want to select the PCs which explain most of the data.

To do this we can plot the variance explained by each PC and identify an elbow point.

percent.var <- attr(reducedDim(sce.NeuroD1_filtered_QCed), "percentVar")
plot(percent.var, log = "y", xlab = "PC", ylab = "Variance explained (%)")

# library(PCAtools) chosen.elbow <- findElbowPoint(percent.var)

Visualise in 2D with Tisne and UMAP

With our PCs now calculated we can project these onto a 2D representation of the data. Here we will use two popular methods of TSNE and UMAP.

sce.NeuroD1_filtered_QCed <- runTSNE(sce.NeuroD1_filtered_QCed, n_dimred = 30)
sce.NeuroD1_filtered_QCed <- runUMAP(sce.NeuroD1_filtered_QCed, n_dimred = 30)
reducedDimNames(sce.NeuroD1_filtered_QCed)
## [1] "PCA"  "TSNE" "UMAP"

Visualise in 2D with Tisne and UMAP

Witht the TSNE and UMAP we may first want to overlay some features of QC to see if this affects the data still.

plotUMAP(sce.NeuroD1_filtered_QCed, colour_by = "subsets_Mito_percent")

plotTSNE(sce.NeuroD1_filtered_QCed, colour_by = "subsets_Mito_percent")

Clustering

For clustering we can use the set of tools available in the bluster package.

Here we run the default graph based methods walktrap and louvain from the igraph package.

require(bluster)
clust.louvain <- clusterCells(sce.NeuroD1_filtered_QCed, use.dimred = "PCA", BLUSPARAM = NNGraphParam(cluster.fun = "louvain",
    cluster.args = list(resolution = 0.8)))
clust.default <- clusterCells(sce.NeuroD1_filtered_QCed, use.dimred = "PCA")

Clustering

Once we have the clustering we will add them to the colData for the object and replot the UMAP

colLabels(sce.NeuroD1_filtered_QCed) <- clust.louvain
colData(sce.NeuroD1_filtered_QCed)$DefaultLabel <- clust.default
plotUMAP(sce.NeuroD1_filtered_QCed, colour_by = "label")

Clustering

We can now plot the mito percent against clusters to see if any clusters are driven by the mito percent of droplets.

plotColData(sce.NeuroD1_filtered_QCed, x = "label", y = "subsets_Mito_percent", colour_by = "label")

Clustering

We can then compare our overlaps between our clusterings in a simple heatmap.

tab <- table(Walktrap = clust.default, Louvain = clust.louvain)
rownames(tab) <- paste("Walktrap", rownames(tab))
colnames(tab) <- paste("louvain", colnames(tab))
library(pheatmap)
pheatmap(log10(tab + 10), color = viridis::viridis(100), cluster_cols = FALSE, cluster_rows = FALSE)

Some ground truth

The paper also provides their annotation of cells. We can read this in and overlay with our umap.

eec_paper_meta <- read.delim("https://rubioinformatics.s3.amazonaws.com/scRNA_graduate/GSE224223_EEC_metadata.csv",
    sep = ";")
nrd1_cells <- as.data.frame(eec_paper_meta[eec_paper_meta$Library == "Nd1", ])

nrd1_cells$X <- gsub("_2$", "", nrd1_cells$X)

newMeta <- merge(colData(sce.NeuroD1_filtered_QCed), nrd1_cells, by.x = "Barcode",
    by.y = "X", all.x = TRUE, all.y = FALSE, order = FALSE)
rownames(newMeta) <- newMeta$Barcode
newMeta$All_Cell_Types[is.na(newMeta$All_Cell_Types)] <- "Filtered"
colData(sce.NeuroD1_filtered_QCed) <- newMeta
plotUMAP(sce.NeuroD1_filtered_QCed, colour_by = "All_Cell_Types")

Some ground truth

We can also see how the annotation overlaps with our clusters.

tab <- table(Walktrap = sce.NeuroD1_filtered_QCed$All_Cell_Types, Louvain = sce.NeuroD1_filtered_QCed$label)
rownames(tab) <- paste("CellType", rownames(tab))
colnames(tab) <- paste("Leiden", colnames(tab))
library(pheatmap)
pheatmap(log10(tab + 10), color = viridis::viridis(100), cluster_cols = FALSE, cluster_rows = FALSE)

Identify markers

Now we have clusters defined we can identify genes which are differentially expressed within a cluster compared to other clusters.

markers.sce.NeuroD1 <- scoreMarkers(sce.NeuroD1_filtered_QCed, sce.NeuroD1_filtered_QCed$label)
chosen <- markers.sce.NeuroD1[["4"]]
ordered <- chosen[order(chosen$mean.AUC, decreasing = TRUE), ]
head(ordered[, 1:4])  # showing basic stats only, for brevity.
## DataFrame with 6 rows and 4 columns
##      self.average other.average self.detected other.detected
##         <numeric>     <numeric>     <numeric>      <numeric>
## Pzp       3.39590     0.0791581      0.975207      0.0501323
## Cck      11.32544     4.6047391      1.000000      0.9882847
## Agr2      6.82305     1.5954380      1.000000      0.5219864
## Scg2      6.57950     1.9450183      1.000000      0.5898348
## Upp1      4.17198     1.3609008      0.995868      0.5891026
## Agr3      4.05125     0.9985682      1.000000      0.4145315

Plot markers

We can then use the plotExpression function to visualise a gene’s expression across clusters.

plotExpression(sce.NeuroD1_filtered_QCed, features = head(rownames(ordered), n = 1),
    x = "label", colour_by = "label")

saveRDS(sce.NeuroD1_filtered_QCed, file = "sce.NeuroD1_filtered_QCed.RDS")

Seurat


Working with CellRanger filtered matrix

We can use Seurat to import our data as before for Bioconductor.

library(Seurat)
h5file <- "https://rubioinformatics.s3.amazonaws.com/scRNA_graduate/SRR_NeuroD1/outs/filtered_feature_bc_matrix.h5"
local_h5file <- basename(h5file)
download.file(h5file, local_h5file)
Nd1T.mat <- Read10X_h5(filename = local_h5file)
Nd1T.obj <- CreateSeuratObject(Nd1T.mat, project = "Nd1")
## Warning: Feature names cannot have underscores ('_'), replacing with dashes
## ('-')

The Seurat object

The Seurat object contains now our information on Neurod1 droplets.

Nd1T.obj
## An object of class Seurat 
## 39905 features across 3468 samples within 1 assay 
## Active assay: RNA (39905 features, 0 variable features)
##  1 layer present: counts

The Seurat object

We can access rownames and column names in a similar manner as before.

rownames(Nd1T.obj)[1:2]
## [1] "Gm26206" "Xkr4"
colnames(Nd1T.obj)[1:2]
## [1] "AAACCCACACAATGTC-1" "AAACCCACACCACTGG-1"

The Seurat object

To access metadata we need to directly access the slot

Nd1T.obj@meta.data[1:2, ]
##                    orig.ident nCount_RNA nFeature_RNA
## AAACCCACACAATGTC-1        Nd1      23320         4776
## AAACCCACACCACTGG-1        Nd1       1944          176

The Seurat object

We can also review the active assay and active cell-barcode identifiers.

Nd1T.obj@active.assay
## [1] "RNA"
Nd1T.obj@active.ident[1:2]
## AAACCCACACAATGTC-1 AAACCCACACCACTGG-1 
##                Nd1                Nd1 
## Levels: Nd1

The Seurat object

The get access to the assays we can pull directly from the relevent slots.

Nd1T.obj@assays$RNA$counts[1:2, 1:2]
## 2 x 2 sparse Matrix of class "dgCMatrix"
##         AAACCCACACAATGTC-1 AAACCCACACCACTGG-1
## Gm26206                  .                  .
## Xkr4                     .                  .

Gather QC

As with Biocondcutor we must again identify mitochondrial genes.

mito.genes <- grep("^MT", rownames(Nd1T.obj), value = T)
mito.genes[1:10]
##  [1] "MT-TrnF"    "MT-mt-Rnr1" "MT-TrnV"    "MT-mt-Rnr2" "MT-TrnL1"  
##  [6] "MT-ND1"     "MT-TrnI"    "MT-TrnQ"    "MT-TrnM"    "MT-ND2"

Gather QC

We can the use PercentageFeatureSet function to identify percent of mito reads in each cell and the AddMetaData function to add this to the Seurat object.

percent.mt <- PercentageFeatureSet(Nd1T.obj, features = mito.genes)
Nd1T.obj <- AddMetaData(Nd1T.obj, metadata = percent.mt, col.name = "percent.mt")
head(Nd1T.obj@meta.data)
##                    orig.ident nCount_RNA nFeature_RNA percent.mt
## AAACCCACACAATGTC-1        Nd1      23320         4776  14.399657
## AAACCCACACCACTGG-1        Nd1       1944          176  81.121399
## AAACCCACACGAGAAC-1        Nd1      76762         7564   9.710534
## AAACCCAGTGTCCGGT-1        Nd1      17440         4177  30.928899
## AAACGAAAGCCTCGTG-1        Nd1       9512         2187  29.447014
## AAACGAAAGCCTTGAT-1        Nd1      11320         3181  14.187279

Plot QC

We can then review QC metrics across our droplets.

VlnPlot(Nd1T.obj, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)
## Warning: Default search for "data" layer in "RNA" assay yielded no results;
## utilizing "counts" layer instead.

Plot QC

As well as plot QC metrics against each other.

plot1 <- FeatureScatter(Nd1T.obj, feature1 = "nCount_RNA", feature2 = "percent.mt")
plot2 <- FeatureScatter(Nd1T.obj, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")
plot3 <- FeatureScatter(Nd1T.obj, feature1 = "nFeature_RNA", feature2 = "percent.mt")
plot1 + plot2 + plot3

Subset by QC

We use the subset function to remove low quality cells from our object.

Nd1T.obj.filt <- subset(Nd1T.obj, subset = nCount_RNA < 125000 & percent.mt < 25 &
    nFeature_RNA > 200)
Nd1T.obj.filt
## An object of class Seurat 
## 39905 features across 1676 samples within 1 assay 
## Active assay: RNA (39905 features, 0 variable features)
##  1 layer present: counts

Normalise data

We will now normalise our data by scaling to library size and applying a log2 transformation.

Nd1T.obj.filt <- NormalizeData(Nd1T.obj.filt)
## Normalizing layer: counts

Variable features

As with Bioc we will now identify variable features.

Here the mean-variance relationship is accounted for by applying a VST transformation.

Nd1T.obj.filt <- FindVariableFeatures(Nd1T.obj.filt, selection.method = "vst", nfeatures = 3000)
## Finding variable features for layer counts
top10 <- head(VariableFeatures(Nd1T.obj.filt), 10)
top10
##  [1] "Olfm4"  "Iapp"   "H2ac23" "Dmbt1"  "Ube2c"  "Asic5"  "Ifitm3" "Reg3b" 
##  [9] "Hmgb2"  "Reg3g"

Variable features

We can then plot these features using the VariableFeaturePlot function.

plot1 <- VariableFeaturePlot(Nd1T.obj.filt)
plot2 <- LabelPoints(plot = plot1, points = top10, repel = TRUE)
## When using repel, set xnudge and ynudge to 0 for optimal results
plot2
## Warning in scale_x_log10(): log-10 transformation introduced infinite values.

Scaling data

In Bioconductor the scaling of data was done inside the PCA functions.

Here we apply the scaling using the ScaleData function.

all.genes <- rownames(Nd1T.obj.filt)
Nd1T.obj.filt <- ScaleData(Nd1T.obj.filt, features = all.genes)
## Centering and scaling data matrix
Nd1T.obj.filt@assays
## $RNA
## Assay (v5) data with 39905 features for 1676 cells
## Top 10 variable features:
##  Olfm4, Iapp, H2ac23, Dmbt1, Ube2c, Asic5, Ifitm3, Reg3b, Hmgb2, Reg3g 
## Layers:
##  counts, data, scale.data

PCA

Now we have our scaled data we can apply PCA and identify how many PCs to use in downstream analysis.

Nd1T.obj.filt <- RunPCA(Nd1T.obj.filt, features = VariableFeatures(object = Nd1T.obj))
## PC_ 1 
## Positive:  Mir6236, Apoa1, Bnip5, Trpm2, Apoa4, Apob, Ada, Nts, Cyp3a11, Gm53845 
##     Vgf, Klf4, Col27a1, Gm57596, Rfx6, Prss30, Sst, Pmp22, Gm32584, Rasal2 
##     Ly6m, Nrp1, Gm50599, Slc38a11, Adgrd1, Ahnak, Slc26a4, Afp, Slc28a2, Apol7a 
## Negative:  Rps2, Rps8, Rpl12, Rpl27a, Eef1b2, Rplp0, Gvin-ps1, Rpl32, Rps15a, Rpl22 
##     Cps1, Rpl18a, Rpl13, Rps20, Rps16, Rpl35, Rps7, Rps28, Rpl23, Rps3 
##     Rpl28, Rps23, Rps4x, Rpl35a, Rps12, Rps24, Rps10, Rps27a, Nme2, Rpl30 
## PC_ 2 
## Positive:  Rbp4, Scgn, Gm11992, Ptma, Bambi, Etv1, Parm1, Cltrn, Isl1, Qpct 
##     Tmem176a, Kcnk16, Crp, Gm53467, Pzp, Gnai1, Rps3a1, Galnt13, Dner, Rpl6 
##     Rps24, Rfxap, Cpn1, Tril, Cyp2j5, Rpl12, Smbd1, Kcnj6, Agr2, Tff3 
## Negative:  Spink1, Clec2h, Mep1b, Ace2, Guca2b, Slc51a, Apoa1, Ces2e, Ggt1, Lgals3 
##     Alpi, Apoa4, Clca4b, Ifi27l2b, Dpep1, Mogat2, Adh6a, H2-T3, Ace, Treh 
##     Maf, Xdh, Cndp2, Maoa, Fabp2, Apol10a, H2-T26, Mall, Arg2, Dnase1 
## PC_ 3 
## Positive:  Rbp4, Scgn, Isl1, Gm11992, Gsto1, Cyp2j6, Etv1, Bambi, Gpd2, Ugt2b34 
##     S100a11, Abcc8, St3gal4, Cltrn, Parm1, Fabp2, Ghrl, Gnai1, Kcnk16, Pzp 
##     Galnt13, Crp, Higd1a, Kcnj6, Ano6, Acvr1c, Plcxd3, Tril, Slc41a2, Cyp2j5 
## Negative:  Tpbg, Slc38a11, Lmx1a, Rasal2, Ccnd2, Krt19, Afp, Trpa1, Shisa2, Pla2g7 
##     Amigo2, Trpm2, Serpinf2, Lypd8l, Sorl1, Slc5a4b, Gm53845, Cwh43, Rpp25, Grik4 
##     Adh1, 9330154J02Rik, Rab3c, Lonrf1, Fabp3, Apoa5, Nars2, Slc5a9, Ptgr1, Tmem238l 
## PC_ 4 
## Positive:  Top2a, Nusap1, Gvin-ps1, Clca3b, Aurkb, Cenpf, D17H6S56E-5, Ckap2l, Hmmr, Mir6236 
##     H1f5, Cenpe, Hmgb2, Cdca2, Melk, Esco2, Cd200l2, Stmn1, Bub1b, Prc1 
##     Uhrf1, Ckap2, Ncapg, H2ac23, Cdca7, Cdca3, Cenpw, Slfn9, Ccnb2, Cenpm 
## Negative:  Tff3, Agr3, Ttr, Tmem176a, Casp6, Agr2, Uqcr11, Pzp, Uqcrb, Ppia 
##     Ndufa4, Glod5, D930028M14Rik, Homer2, Tle1, Cyp2j6, Cpn1, Onecut3, Uqcrq, Trf 
##     Atp5me, Crp, Qpct, Ndufb10, Lypd8, Cox5b, Tmed6, Cyb5a, Rpl19, Atp5mj 
## PC_ 5 
## Positive:  Agr2, Agr3, Mgll, Phgr1, Onecut3, Serpina1e, Tmsb4x, Pzp, Homer2, Klf4 
##     Tle1, Hspb1, Galnt13, D930028M14Rik, Cit, S100a10, Crp, Cpn1, Gm12511, Ckb 
##     Serpina1b, Lamb3, Kcnk16, Slc43a2, Gm53467, Glod5, Slc41a2, Cyp2j5, Trpm2, Fgl2 
## Negative:  Mrln, Scarb1, Phlda1, Rgs4, Rfx6, Rbp2, 2010204K13Rik, Fam167a, Ass1, 4933407L21Rik 
##     Nherf4, Trpc5, Slfn10-ps, Isl1, Plcxd3, Gamt, Th, Sfrp5, Prkg2, Fmo1 
##     Gm32255, Plxnc1, Cth, Pls3, Ntrk1, Gsto1, Gm54263, Acvr1c, Alpl, Prps1
ElbowPlot(Nd1T.obj.filt, ndims = 50)

UMAP

We can then create a 2D representation of our data using a UMAP .

Nd1T.obj.filt <- RunUMAP(Nd1T.obj.filt, dims = 1:30)
## 13:32:03 UMAP embedding parameters a = 0.9922 b = 1.112
## 13:32:03 Read 1676 rows and found 30 numeric columns
## 13:32:03 Using Annoy for neighbor search, n_neighbors = 30
## 13:32:03 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 13:32:03 Writing NN index file to temp file /var/folders/0_/8lnwn5sn7hn9nr8qv2jg48nw0000gn/T//RtmpUTDURW/file16eb7b62b3a5
## 13:32:03 Searching Annoy index using 1 thread, search_k = 3000
## 13:32:03 Annoy recall = 100%
## 13:32:05 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 13:32:06 Initializing from normalized Laplacian + noise (using RSpectra)
## 13:32:06 Commencing optimization for 500 epochs, with 65834 positive edges
## 13:32:09 Optimization finished

UMAP

And we plot this using the FeaturePlot function overlaying the percent.mt.

FeaturePlot(Nd1T.obj.filt, reduction = "umap", features = "percent.mt")

Clustering

We can then perform a louvain clustering using the FindNeighbours and FindClusters functions.

The cluster labels will be stored in active.idents slots.

Nd1T.obj.filt <- FindNeighbors(Nd1T.obj.filt, dims = 1:30)
## Computing nearest neighbor graph
## Computing SNN
Nd1T.obj.filt <- FindClusters(Nd1T.obj.filt, resolution = 0.7)
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 1676
## Number of edges: 54373
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8623
## Number of communities: 12
## Elapsed time: 0 seconds
Nd1T.obj.filt@active.ident
## AAACCCACACAATGTC-1 AAACCCACACGAGAAC-1 AAACGAAAGCCTTGAT-1 AAACGAACATGTCAGT-1 
##                  6                  5                  1                  1 
## AAACGAAGTAGACGTG-1 AAACGAAGTAGCTGTT-1 AAACGAAGTGTATACC-1 AAACGCTAGAGTTGAT-1 
##                  6                  6                  4                  2 
## AAACGCTCATCGCTCT-1 AAAGAACCAGACTCTA-1 AAAGAACCATATCGGT-1 AAAGAACCATCCTCAC-1 
##                  1                  5                  1                  6 
## AAAGAACGTACCTAAC-1 AAAGAACGTTGGACCC-1 AAAGAACTCGTGCACG-1 AAAGGATCATGTTCGA-1 
##                  4                  2                  5                  2 
## AAAGGTACAACTTGGT-1 AAAGGTACATAGGCGA-1 AAAGTCCCATGGCACC-1 AAAGTCCTCTTCGATT-1 
##                  2                  0                  5                  0 
## AAAGTGAAGATGACAT-1 AAATGGAAGTGTAGTA-1 AAATGGACATGCTGCG-1 AACAAAGGTCCGGTGT-1 
##                  2                  9                  3                  6 
## AACAAAGTCGCAATTG-1 AACAAGAGTGAGTGAC-1 AACAGGGCATGTCAGT-1 AACCAACAGCCTGTGC-1 
##                  0                  2                  8                  2 
## AACCAACAGGTCATAA-1 AACCAACGTTGCGGAA-1 AACCAACTCCTATGGA-1 AACCACAAGGTGGTTG-1 
##                  0                  1                  5                  3 
## AACCACAAGTAACCTC-1 AACCACACATCCAATG-1 AACCACAGTCCAGGTC-1 AACCACAGTCGACGCT-1 
##                  6                  5                  0                  1 
## AACCCAAAGATGAACT-1 AACCCAATCCAGCAAT-1 AACCTGACACTCTCGT-1 AACCTGAGTGCATGAG-1 
##                  8                  3                  9                  2 
## AACGAAAAGGGACTGT-1 AACGAAACAAGAAATC-1 AACGGGACACTCACTC-1 AACGGGATCTCGCTCA-1 
##                  0                  0                  0                  9 
## AACGTCAAGCGATGGT-1 AACGTCAGTGTCCACG-1 AACGTCATCTGGTCAA-1 AACTTCTTCATCACTT-1 
##                  5                  6                  8                  9 
## AAGAACAAGGGCAGAG-1 AAGACTCAGAAATTGC-1 AAGACTCAGCAGCGAT-1 AAGACTCCAACCGTGC-1 
##                  3                  1                  4                 10 
## AAGACTCCAAGTGCAG-1 AAGACTCTCCCGTAAA-1 AAGACTCTCTAAGGAA-1 AAGATAGAGGGCAATC-1 
##                  1                  9                  4                  0 
## AAGCATCCACATGTTG-1 AAGCATCTCAGATTGC-1 AAGCATCTCTCGCAGG-1 AAGCGTTAGGCCGCTT-1 
##                  2                  0                  2                  3 
## AAGCGTTAGGGCAGTT-1 AAGCGTTCATTCTTCA-1 AAGGAATAGCCTAGGA-1 AAGGTAAGTAATCAAG-1 
##                  2                  3                  0                  1 
## AAGTCGTAGGTTCAGG-1 AAGTCGTTCATCCCGT-1 AAGTCGTTCTGTTGGA-1 AAGTGAAAGCCGGATA-1 
##                  7                  0                  5                  3 
## AAGTGAACACCCTGAG-1 AAGTGAAGTTTGGCTA-1 AATCACGAGAGCGACT-1 AATCACGCAAGCGCTC-1 
##                  4                  8                  4                 11 
## AATCACGCAATGAGCG-1 AATCACGCAGACAAGC-1 AATCGACGTGCGGATA-1 AATCGACTCATCGTAG-1 
##                  0                  4                  0                  3 
## AATCGACTCTAAGGAA-1 AATGACCAGGGTGAAA-1 AATGACCCAGAGTAAT-1 AATGACCGTCTGTAGT-1 
##                  4                  1                  2                  8 
## AATGACCGTGGTCCCA-1 AATGCCACATAACTCG-1 AATGCCATCCGGCTTT-1 AATGGAAAGCGTATGG-1 
##                  9                  1                  7                  1 
## AATGGAAGTCAGTTTG-1 AATTCCTTCTCCCATG-1 AATTTCCGTCATATGC-1 ACAAAGAAGCTACTGT-1 
##                  0                  3                  3                  2 
## ACAAAGACAACTCATG-1 ACAAAGACAGATAAAC-1 ACAAAGAGTACTGCGC-1 ACAACCACAACACTAC-1 
##                  4                  5                  6                  6 
## ACAACCACAATTCGTG-1 ACAACCACACCCAACG-1 ACAACCATCAAGCCAT-1 ACAACCATCCTACAAG-1 
##                  4                  7                 11                  4 
## ACAAGCTAGCCAAGCA-1 ACAAGCTCACAGAGAC-1 ACAAGCTCACCTGATA-1 ACAAGCTCAGCGTACC-1 
##                  0                  9                  0                  0 
## ACAAGCTGTTATTCCT-1 ACAAGCTTCGGTTAGT-1 ACACAGTAGCACGTCC-1 ACACCAATCATCCTGC-1 
##                  2                  6                  8                  1 
## ACACCAATCCACCTGT-1 ACACGCGCAATAGGGC-1 ACACTGAGTGCCAAGA-1 ACAGAAACACCGAATT-1 
##                  5                  0                  0                  5 
## ACAGAAACAGAGGACT-1 ACAGCCGAGGCCGCTT-1 ACAGGGAAGAAACTGT-1 ACAGGGAGTTGAATCC-1 
##                  1                  4                 10                  9 
## ACATCCCAGCAAGCCA-1 ACATCCCAGGCTAGCA-1 ACATCCCCATCAGTGT-1 ACATGCACATCTTTCA-1 
##                  0                  1                  6                  4 
## ACATGCAGTACTGCGC-1 ACATGCAGTCCTGGGT-1 ACATGCAGTGCATACT-1 ACATTTCAGTCACTGT-1 
##                  5                  4                  1                  5 
## ACATTTCGTATTTCTC-1 ACATTTCTCCGCACTT-1 ACATTTCTCGTGTGGC-1 ACCAAACAGACCTCAT-1 
##                  5                  2                  2                  1 
## ACCAACAAGCCGCTTG-1 ACCAACAGTCGGCTAC-1 ACCACAAGTCGAAGCA-1 ACCACAAGTTTGGAAA-1 
##                  6                  0                  8                  2 
## ACCACAATCACAAGGG-1 ACCATTTAGACGACTG-1 ACCATTTAGGTACAGC-1 ACCATTTAGTCGTCTA-1 
##                  7                  1                 11                  2 
## ACCATTTGTAGTCCTA-1 ACCATTTGTATGTCCA-1 ACCATTTGTGACTGTT-1 ACCCTCATCCTCGCAT-1 
##                  7                  1                  0                  2 
## ACCTACCCAGCGTTGC-1 ACCTACCTCAAGCTTG-1 ACCTACCTCGACGACC-1 ACCTACCTCGTTTACT-1 
##                  3                  6                  3                  6 
## ACCTGAAAGCAACAGC-1 ACCTGAACAATTGCGT-1 ACCTGAAGTCTGGTTA-1 ACGATCAGTCCGTTTC-1 
##                  5                  2                  2                 10 
## ACGATGTCAAGTTCGT-1 ACGATGTCATGTGCTA-1 ACGATGTTCGAAATCC-1 ACGATGTTCTATCGCC-1 
##                  0                  2                  0                  5 
## ACGCACGCAGGTTACT-1 ACGGAAGCAAGGCCTC-1 ACGGAAGCAGGTATGG-1 ACGGGTCAGGCTGGAT-1 
##                  6                  1                  7                  3 
## ACGGGTCAGTAGCTCT-1 ACGGGTCGTCACTCAA-1 ACGGTCGAGCGCTGAA-1 ACGGTCGGTATCGATC-1 
##                  7                  0                  3                  9 
## ACGGTTAAGCGTGCCT-1 ACGGTTAAGCTCGAAG-1 ACGGTTAGTCGTAATC-1 ACGTACAAGCAGGTCA-1 
##                 10                  4                  4                  0 
## ACGTACATCGTTGTGA-1 ACGTAGTAGGGCGAGA-1 ACGTAGTCATCACAGT-1 ACGTAGTGTCCTATAG-1 
##                  2                  6                  0                  0 
## ACGTCCTAGCATCAGG-1 ACGTCCTGTGGCTACC-1 ACGTCCTGTTAAGGGC-1 ACGTTCCCACTATGTG-1 
##                  1                  4                  1                  5 
## ACTATCTGTGGAGAAA-1 ACTATGGGTTCTCGCT-1 ACTATTCTCAAACCCA-1 ACTCCCATCTAGCATG-1 
##                  3                  0                  1                  2 
## ACTCTCGGTCGTGGTC-1 ACTCTCGGTGGAGAAA-1 ACTGATGCAAATGGAT-1 ACTGCAAAGGACTGGT-1 
##                  7                  2                  5                  3 
## ACTGCAAAGGTATCTC-1 ACTGTCCCATCTGGGC-1 ACTGTCCGTCGCACGT-1 ACTGTGACACCCTAAA-1 
##                  3                  8                  0                  1 
## ACTGTGATCTAACACG-1 ACTTAGGAGCAGCGAT-1 ACTTAGGAGTGAGTTA-1 ACTTAGGCAAGTGGTG-1 
##                  2                  6                  2                  1 
## ACTTAGGGTCCAGCCA-1 ACTTATCGTATCGCTA-1 ACTTATCGTTAGAAGT-1 ACTTATCTCCTGTAGA-1 
##                  3                  5                  0                  1 
## ACTTCCGGTACTGCGC-1 ACTTCCGTCTCCCATG-1 ACTTCGCCAGGTTCAT-1 ACTTCGCGTAGAGACC-1 
##                  3                  3                  0                  4 
## ACTTTGTAGACGCTCC-1 ACTTTGTAGTAAATGC-1 ACTTTGTGTAGAATGT-1 AGAAATGGTCATCGGC-1 
##                  2                  1                  1                  3 
## AGAAATGGTTCAGGTT-1 AGAACAAAGAGTGTGC-1 AGAACAAAGGGCAATC-1 AGAACCTGTAGACGGT-1 
##                  9                  9                  1                  4 
## AGAACCTGTGCCTTTC-1 AGAACCTTCAGTCTTT-1 AGAAGTAAGAAGCGCT-1 AGAAGTACATGGGCAA-1 
##                  3                  3                  8                  6 
## AGACAAACATCGCCTT-1 AGACAAATCCCGAATA-1 AGACACTGTAACCCTA-1 AGACACTTCCTACCAC-1 
##                  5                  2                  1                  5 
## AGACAGGCATCGTGCG-1 AGACCATAGCGACCCT-1 AGACCCGTCGAACACT-1 AGACCCGTCGTTGTTT-1 
##                  4                  0                  1                  0 
## AGAGAATCAACTGGTT-1 AGAGAATTCCCGTGTT-1 AGAGAGCTCAGATGCT-1 AGAGCAGAGGTTCTTG-1 
##                  1                  2                  4                  0 
## AGAGCAGAGTTTGGCT-1 AGAGCAGCAAGGACAC-1 AGAGCAGCAGCACCCA-1 AGAGCAGGTGATGTAA-1 
##                  1                  4                 11                  8 
## AGAGCCCAGCCACAAG-1 AGATAGAAGGATGTTA-1 AGATCCAAGCCTCGTG-1 AGATCCATCAGCTTCC-1 
##                  0                  0                  0                  1 
## AGATCGTCATTGAGCT-1 AGATCGTGTAGCGAGT-1 AGATGAATCAAACGAA-1 AGATGCTAGCATCAAA-1 
##                  1                  1                  1                  1 
## AGATGCTAGGTGCAGT-1 AGATGCTAGTAGAGTT-1 AGATGCTCATGTCAGT-1 AGATGCTGTACTCGCG-1 
##                  1                  6                  5                  2 
## AGCATCAGTCCTGTTC-1 AGCCAATGTCATTCCC-1 AGCCAGCGTTAGGCTT-1 AGCGCCAAGAGGGCGA-1 
##                  2                  6                  4                  5 
## AGCGCCACATAATCGC-1 AGCGCCAGTATAGCTC-1 AGCGTATCAAATTGGA-1 AGCGTATGTTGCCATA-1 
##                  1                  5                  2                  6 
## AGCGTATTCTGGCCGA-1 AGCGTCGCATATCGGT-1 AGCTACAGTCGGATTT-1 AGCTACATCGTCAAAC-1 
##                 10                  0                  0                  0 
## AGCTCAAAGCAGATAT-1 AGCTCAAAGTCTGTAC-1 AGCTCAACACCTTCGT-1 AGCTCAATCGTTCGCT-1 
##                  1                  7                 10                  6 
## AGCTTCCAGAAGCTCG-1 AGCTTCCAGGTGCCTC-1 AGCTTCCTCAAGATAG-1 AGCTTCCTCGTGTTCC-1 
##                  4                  1                  2                 10 
## AGGAAATAGAAGCTGC-1 AGGAAATTCATTCCTA-1 AGGAATATCTGTAACG-1 AGGACGACATGGGAAC-1 
##                  8                  6                  0                  3 
## AGGACTTAGTCCTACA-1 AGGACTTCAGAGGACT-1 AGGAGGTAGGAGCAAA-1 AGGAGGTGTTATGGTC-1 
##                  0                  4                  7                  0 
## AGGATAAGTGTAGCAG-1 AGGATCTAGACGTCCC-1 AGGATCTCAACGATCT-1 AGGCCACAGTTCATCG-1 
##                  4                  1                 11                  8 
## AGGCCACCATCAGTCA-1 AGGCCACCATGGGTCC-1 AGGGCCTAGTACAACA-1 AGGGCCTAGTTGCATC-1 
##                  6                  0                  6                  3 
## AGGGCCTCAGATTAAG-1 AGGGCCTTCCGTTGAA-1 AGGGCCTTCGTAGGGA-1 AGGGCTCAGGACCCAA-1 
##                  8                  2                  0                  1 
## AGGGCTCCAGCTCCTT-1 AGGGCTCGTAGCGCCT-1 AGGGCTCTCTCGACGG-1 AGGGTCCTCGTTACCC-1 
##                  0                  6                  0                 10 
## AGGGTGAAGCATACTC-1 AGGTAGGCACGTAACT-1 AGGTAGGGTACGTACT-1 AGGTAGGGTGAGCCAA-1 
##                  0                  1                  1                  7 
## AGGTCTATCGATTTCT-1 AGGTTACGTACAGGTG-1 AGTAACCCAGACTGCC-1 AGTACCAGTCGCGTTG-1 
##                  5                  0                  2                  1 
## AGTACTGAGGGTTTCT-1 AGTACTGCACTTGACA-1 AGTACTGGTATGGTAA-1 AGTAGTCGTGAGTGAC-1 
##                  0                  8                  9                  9 
## AGTAGTCGTTGTGTTG-1 AGTCAACAGAGTGTGC-1 AGTCAACCACTAACCA-1 AGTCACAAGCGATGAC-1 
##                  3                  5                  4                 10 
## AGTCATGCAGCACAAG-1 AGTCATGTCTCCCAAC-1 AGTCTCCCATGCCGAC-1 AGTCTCCCATGGACAG-1 
##                  3                  3                  5                  9 
## AGTGATCAGGACGCTA-1 AGTTAGCGTTAGGACG-1 AGTTCCCGTTACCGTA-1 ATACCGATCCGTAGTA-1 
##                  3                  4                  1                  5 
## ATACCTTGTCGATTCA-1 ATACCTTTCTAGAGCT-1 ATACCTTTCTCCCATG-1 ATAGACCCACTCGATA-1 
##                  0                  0                  3                  4 
## ATAGACCGTCATCGCG-1 ATAGACCTCATGCCAA-1 ATAGAGACAAGAATGT-1 ATAGGCTGTAACGCGA-1 
##                  0                  0                  0                  4 
## ATAGGCTTCTAGCCAA-1 ATATCCTCAGGCGATA-1 ATCACAGCACCGTGCA-1 ATCACAGCACGATAGG-1 
##                  1                  0                  3                 11 
## ATCACGAAGGTACAAT-1 ATCACGATCGCTAGCG-1 ATCACTTAGGGTTTCT-1 ATCACTTTCCTAAGTG-1 
##                  5                  0                  0                  1 
## ATCAGGTAGGGTGGGA-1 ATCATTCAGCAGGTCA-1 ATCATTCCATAGACTC-1 ATCATTCGTGACTAAA-1 
##                 10                  2                  3                  0 
## ATCATTCGTTCGTTCC-1 ATCCACCAGAGTTCGG-1 ATCCACCCAGTGGCTC-1 ATCCACCGTCATCCCT-1 
##                  7                  4                  5                  3 
## ATCCACCGTGTGACCC-1 ATCCATTCAAGCGATG-1 ATCCATTGTGCCCGTA-1 ATCCATTTCGGCATAT-1 
##                  3                 11                  4                  2 
## ATCCCTGAGGTAAGGA-1 ATCCCTGCAGCACCCA-1 ATCCCTGCATGTGACT-1 ATCCCTGTCTGAGATC-1 
##                  6                  6                  1                  5 
## ATCCGTCCAAGCTGTT-1 ATCCGTCGTCTGTAGT-1 ATCCGTCGTCTTGGTA-1 ATCCGTCTCTGAATGC-1 
##                  7                  9                  3                  3 
## ATCCTATAGTCGGGAT-1 ATCCTATCAAGTGATA-1 ATCCTATTCGTGCATA-1 ATCGATGAGTTAACAG-1 
##                  0                  2                  1                  0 
## ATCGATGCACCAGACC-1 ATCGATGGTGGGAGAG-1 ATCGCCTCAATGTCAC-1 ATCGGATGTAGTATAG-1 
##                  0                  1                  0                  0 
## ATCGTCCCAAGACCTT-1 ATCGTCCCACCGTACG-1 ATCTCTAGTTTCGTAG-1 ATCTTCACACAATCTG-1 
##                  2                  2                  3                  3 
## ATCTTCACACGCCACA-1 ATCTTCACAGACCCGT-1 ATGAAAGCATTGCTTT-1 ATGAAAGTCAAAGAAC-1 
##                  4                  0                  0                  3 
## ATGAAAGTCAGGGTAG-1 ATGAAAGTCATCTACT-1 ATGAAAGTCTCTTAAC-1 ATGACCAAGCGCGTTC-1 
##                  2                  0                  0                  2 
## ATGACCAAGGAACTCG-1 ATGACCACAAGTGCAG-1 ATGACCACATTCACAG-1 ATGACCAGTCGTTGCG-1 
##                  5                  0                  7                  2 
## ATGACCATCCCGAACG-1 ATGAGGGGTACCTAAC-1 ATGAGTCCAAAGACGC-1 ATGAGTCTCTGTACAG-1 
##                  6                  6                  3                  0 
## ATGATCGTCAAATGAG-1 ATGCCTCAGGTGCTGA-1 ATGCCTCTCACGGGAA-1 ATGCGATTCGTTCTCG-1 
##                  1                  8                  0                 11 
## ATGGATCGTGTAGGAC-1 ATGGATCTCCATCCGT-1 ATGGGAGGTGGTGATG-1 ATGGGTTAGCAACCAG-1 
##                  0                  1                  3                  0 
## ATGGTTGGTCAACACT-1 ATGGTTGGTTATCTGG-1 ATGGTTGTCAAGCTGT-1 ATGGTTGTCCCGATCT-1 
##                  1                  5                  5                  2 
## ATGGTTGTCCGCTAGG-1 ATGTCTTCACGGTCTG-1 ATGTCTTCACTGATTG-1 ATGTCTTCATCGCTAA-1 
##                  2                  4                  6                  9 
## ATGTCTTGTCATTCCC-1 ATTACTCTCCGGCTTT-1 ATTATCCAGACATAGT-1 ATTATCCGTATCGATC-1 
##                  7                  0                  0                  0 
## ATTATCCTCCCAAGTA-1 ATTCATCAGGGTAGCT-1 ATTCCATTCCGTCCTA-1 ATTCCTATCTAGGCCG-1 
##                  0                  3                  2                  0 
## ATTCGTTCAGAGTAAT-1 ATTCGTTTCGAGAATA-1 ATTCTACAGCTCGAAG-1 ATTCTACTCACCGGGT-1 
##                  1                  0                  5                  1 
## ATTCTTGAGGCAGGGA-1 ATTCTTGTCAACCTCC-1 ATTGGGTTCGTAGGGA-1 ATTGTTCCATCGAAGG-1 
##                  8                  6                  1                  9 
## ATTGTTCGTCCACGCA-1 ATTGTTCTCACCGACG-1 ATTGTTCTCTCCATAT-1 ATTTACCAGGGCGAGA-1 
##                  5                  8                  3                  8 
## ATTTACCAGTACTGTC-1 ATTTACCGTTGCATTG-1 ATTTCACAGTCCCAGC-1 ATTTCACGTTGGTAGG-1 
##                  2                  0                  1                  5 
## ATTTCTGAGAGCACTG-1 CAAAGAAGTGCCTACG-1 CAAAGAATCGTAGTGT-1 CAACAACAGTAGAGTT-1 
##                  8                  8                  5                  5 
## CAACAACTCTTACCAT-1 CAACAGTTCTCCCATG-1 CAACCAAAGATCCAAA-1 CAACCAAAGTGCACTT-1 
##                  8                  3                  6                  7 
## CAACCTCAGTGCAACG-1 CAACCTCGTATTTCGG-1 CAACCTCGTTGTCAGT-1 CAACGATAGATTGTGA-1 
##                  7                 11                  4                  1 
## CAACGATCAACGTATC-1 CAACGATTCCACTGGG-1 CAACGATTCCCGTGAG-1 CAACGATTCCGAAGGA-1 
##                  8                  5                  1                  2 
## CAACGGCGTAACACCT-1 CAAGACTCACTGAGTT-1 CAAGAGGAGCTACAAA-1 CAAGAGGCAACACGAG-1 
##                  3                  0                  5                  0 
## CAAGAGGGTGGGACAT-1 CAAGCTAAGACGGAAA-1 CAAGCTAGTCTCCTGT-1 CAAGCTAGTGCATGTT-1 
##                  4                  1                  8                  2 
## CAAGCTATCCTTCGAC-1 CAAGGGAAGAGTTGCG-1 CAAGGGACACCGGAAA-1 CAAGGGACACGTGAGA-1 
##                  0                  7                  0                  5 
## CAAGGGAGTAAGAACT-1 CAAGGGATCTTCGTAT-1 CAATACGTCACACCCT-1 CAATCGACAATTGAAG-1 
##                  1                  4                  1                  3 
## CAATGACAGGTTGCCC-1 CAATGACCAAATAAGC-1 CAATGACCATATTCGG-1 CACAACACATGACTCA-1 
##                  8                  3                  4                  1 
## CACACAACACCCTATC-1 CACAGATTCCTAACAG-1 CACAGGCTCTGTCCGT-1 CACATGAAGAGGATGA-1 
##                  2                  4                  2                  4 
## CACATGAGTGCGTTTA-1 CACCAAAGTAATCAGA-1 CACCGTTCATTAGGCT-1 CACCGTTGTCCAGGTC-1 
##                  1                  6                  3                  1 
## CACCGTTTCACTCACC-1 CACCGTTTCTAGACCA-1 CACTGGGAGCTACTGT-1 CACTGGGGTAATGTGA-1 
##                  3                  7                  3                  4 
## CACTGTCCATTAGGAA-1 CACTTCGGTAAGTTGA-1 CACTTCGTCAGCATTG-1 CAGATACAGGAACGAA-1 
##                  8                  2                  3                  2 
## CAGATACGTAGGAGTC-1 CAGATACGTGCACATT-1 CAGATCAAGTTCACTG-1 CAGATCACAAGGTACG-1 
##                  2                  0                  1                  6 
## CAGATTGAGCGCCATC-1 CAGCAATGTGAGCAGT-1 CAGCCAGAGCTGGAGT-1 CAGCCAGCACATCCCT-1 
##                  0                  9                  2                  4 
## CAGCCAGGTTCAAGGG-1 CAGCCAGTCAGTGGGA-1 CAGCGTGAGCTTGTGT-1 CAGCGTGCACCCATAA-1 
##                  5                  7                 10                  4 
## CAGCGTGGTTGCATGT-1 CAGCGTGGTTGCTCGG-1 CAGGCCACACGAAGAC-1 CAGGCCACAGGTCAAG-1 
##                  0                 11                  1                  2 
## CAGGGCTAGAGAGCAA-1 CAGGGCTTCATTCACT-1 CAGGTATCATGGCCCA-1 CAGTGCGCACATACGT-1 
##                  4                  3                  4                  0 
## CAGTGCGGTCATTCCC-1 CAGTTAGGTAATCAGA-1 CAGTTCCCACCTGAAT-1 CAGTTCCGTGCATGTT-1 
##                  6                  2                  0                  3 
## CAGTTCCGTGTTTACG-1 CATAAGCGTATGGAAT-1 CATACAGGTTGGAGAC-1 CATACTTCAAGAGATT-1 
##                  0                  9                  4                  0 
## CATACTTGTCACCGCA-1 CATACTTGTCCCTGTT-1 CATAGACAGGGTGGGA-1 CATAGACGTAGATCGG-1 
##                 10                  3                  1                  6 
## CATAGACGTCAATGGG-1 CATCAAGCATAGATGA-1 CATCAAGTCGCTATTT-1 CATCCACAGAAGCGCT-1 
##                  0                  8                  0                  0 
## CATCCACAGCGTGCTC-1 CATCCGTGTAGGAGGG-1 CATCGCTCAGGGATAC-1 CATCGGGAGAGCAAGA-1 
##                  1                  5                 10                  9 
## CATCGGGTCTCGCGTT-1 CATCGTCAGAAGCGCT-1 CATCGTCTCCGGCAGT-1 CATGAGTAGTTCGGTT-1 
##                  0                  3                  1                  7 
## CATGAGTTCTTCCGTG-1 CATGCAATCTCGCCTA-1 CATGCCTCACTTGAGT-1 CATGCCTCAGTGGTGA-1 
##                  4                  4                  1                  3 
## CATGCCTTCATCACTT-1 CATGCGGTCTAGCAAC-1 CATGCTCGTTGTACGT-1 CATGCTCTCTGAGCAT-1 
##                  5                  4                  7                  0 
## CATGGATAGAGTCTGG-1 CATGGATAGGTAGACC-1 CATGGATCATCTTCGC-1 CATGGATGTAGATTGA-1 
##                  0                  0                  0                  1 
## CATGGATTCCAGTTCC-1 CATGGTACAAATCAAG-1 CATTCTAGTAGAATAC-1 CATTGAGTCCCGAGTG-1 
##                  2                  7                  1                  6 
## CATTGCCCAGATGCGA-1 CATTGTTAGCTTTGTG-1 CATTGTTGTTGCTAGT-1 CATTTCAGTAATGATG-1 
##                  0                  2                  5                  4 
## CATTTCAGTCGAAGCA-1 CCAAGCGGTAGCCCTG-1 CCAATGAAGATGCGAC-1 CCAATGAAGGAACTAT-1 
##                  4                  3                  5                 10 
## CCAATGATCTGCTGAA-1 CCAATTTGTGTCGCTG-1 CCACAAACATGCGGTC-1 CCACACTGTATCGAGG-1 
##                  1                  7                  7                  1 
## CCACCATAGAATTTGG-1 CCACCATAGTATAACG-1 CCACGAGTCGCCTATC-1 CCACGAGTCTTAGCCC-1 
##                  6                  0                  2                  5 
## CCACGTTTCCATCAGA-1 CCACGTTTCTATCCAT-1 CCACTTGAGGGTACAC-1 CCACTTGTCACTAGCA-1 
##                  0                  0                  1                  1 
## CCACTTGTCTGAGATC-1 CCATCACCAACCGATT-1 CCCAACTTCCCGAAAT-1 CCCGAAGAGACAACAT-1 
##                  1                  1                  0                  2 
## CCCGAAGAGTGAGTGC-1 CCCGAAGTCCGATGCG-1 CCCGGAACAACTAGAA-1 CCCGGAACATGTGGTT-1 
##                  4                 10                  3                  1 
## CCCTAACAGGATCACG-1 CCCTAACCACAGTCAT-1 CCCTAACCATTAAAGG-1 CCCTAACGTGTTCAGT-1 
##                  2                  1                  9                  7 
## CCCTAACTCATCTACT-1 CCCTAACTCTGCGGAC-1 CCCTCAAGTGACACGA-1 CCCTCAATCCCATAAG-1 
##                  7                  6                  8                  2 
## CCCTCAATCCCTTCCC-1 CCCTCTCGTGAGATAT-1 CCCTTAGGTCTCTCTG-1 CCCTTAGTCATGGCCG-1 
##                  5                 10                  3                  0 
## CCCTTAGTCGTTCATT-1 CCGAACGGTAAGCTCT-1 CCGATCTCACGCGTCA-1 CCGATCTTCAGAGTTC-1 
##                  7                  0                  3                  4 
## CCGATGGGTTGGGCCT-1 CCGATGGTCCGGGACT-1 CCGCAAGCAGACCTGC-1 CCGCAAGGTGTTGATC-1 
##                  5                  4                  3                  3 
## CCGGACAAGGTTTGAA-1 CCGGGTAAGTGTAGAT-1 CCGGTGAAGGCCACCT-1 CCGGTGACACAACCGC-1 
##                  9                  9                  3                  7 
## CCGGTGAGTCAAAGAT-1 CCGTGAGAGATTACCC-1 CCGTGAGCAAGCCTGC-1 CCGTGAGCATCCGCGA-1 
##                  2                  7                  1                  2 
## CCGTTCATCAGATTGC-1 CCTAACCTCACTACGA-1 CCTAAGAAGAGGTCGT-1 CCTAAGACAGTAGGAC-1 
##                  2                  2                  8                  1 
## CCTAAGACAGTGTGGA-1 CCTAAGACATGGCCCA-1 CCTAAGAGTATTTCCT-1 CCTACGTAGCGAGGAG-1 
##                  2                  2                  9                  0 
## CCTACGTAGTGGCCTC-1 CCTACGTCACGCGTCA-1 CCTACGTTCAAACTGC-1 CCTATCGAGTATTGCC-1 
##                  2                  2                  0                  1 
## CCTATCGCAGTTGAAA-1 CCTCAACAGATTGGGC-1 CCTCAACTCGCACGAC-1 CCTCAACTCTCTGCTG-1 
##                  7                  0                  5                  0 
## CCTCAGTCATCCTATT-1 CCTCAGTCATCGGATT-1 CCTCAGTCATGGGAAC-1 CCTCATGCACTTCATT-1 
##                  2                  5                  4                  1 
## CCTCATGTCGTTCCCA-1 CCTCCAACAAGCTCTA-1 CCTCCAACACTGCACG-1 CCTCCAAGTGCCTGCA-1 
##                  5                 11                  0                  0 
## CCTCCTCTCACTTGTT-1 CCTCCTCTCATCCTAT-1 CCTCTAGAGAGCCATG-1 CCTCTAGCATGCAGGA-1 
##                  0                 10                  0                  8 
## CCTCTCCGTGACCGTC-1 CCTCTCCTCGTGGGAA-1 CCTGCATAGCTAAACA-1 CCTGCATCAGTCGTTA-1 
##                  1                  5                  1                  1 
## CCTGCATGTCAGACGA-1 CCTGCATTCGATTCCC-1 CCTGCATTCTCCAATT-1 CCTTCAGCAGCTACTA-1 
##                 10                  3                  2                  6 
## CCTTGTGAGCGTGCTC-1 CCTTGTGCAAGCAGGT-1 CCTTGTGCAGATCATC-1 CCTTGTGGTCGCATGC-1 
##                  9                  5                  0                  1 
## CCTTGTGTCACCTGGG-1 CCTTGTGTCCACGTCT-1 CCTTGTGTCGTTCCTG-1 CCTTTGGAGAGCAGCT-1 
##                  3                  6                  0                  0 
## CCTTTGGAGGGCTTCC-1 CCTTTGGGTCACAGTT-1 CCTTTGGTCAGCTTCC-1 CGAAGGAAGGATATAC-1 
##                  7                  2                  0                  1 
## CGAAGGATCGAATGCT-1 CGAATTGAGTGCTCAT-1 CGAATTGCATTCTCTA-1 CGAGAAGGTGACAGCA-1 
##                  6                  7                  6                  0 
## CGAGGAAAGGCAGGTT-1 CGAGGAAGTAGTGCGA-1 CGAGGCTGTCCGTACG-1 CGAGTGCAGCGCTTCG-1 
##                  0                  0                  2                  3 
## CGAGTGCCAACTAGAA-1 CGAGTGCGTTCGGTTA-1 CGAGTTAAGCGACCCT-1 CGAGTTAAGCGCACAA-1 
##                  2                  2                  9                  2 
## CGAGTTAGTGTGTCCG-1 CGAGTTATCATGCCCT-1 CGAGTTATCTGTCGTC-1 CGATCGGAGATGCGAC-1 
##                  1                  1                  6                  8 
## CGATCGGTCCCTCAAC-1 CGATCGGTCGGACTTA-1 CGATGCGTCTTAGCAG-1 CGATGGCAGGTCTACT-1 
##                  3                  0                  2                  3 
## CGATGGCCACTGGCGT-1 CGCAGGTCACTGGCGT-1 CGCATAAGTTCTCCCA-1 CGCATAATCTCCCATG-1 
##                  3                  1                  6                  3 
## CGCATGGAGATGTTAG-1 CGCATGGAGCCTAGGA-1 CGCCAGATCGGTGAAG-1 CGCCATTGTTGGATCT-1 
##                  5                  8                  0                  4 
## CGGAACCCAGGCTTGC-1 CGGAACCGTGAGTAAT-1 CGGAACCTCTTAATCC-1 CGGAATTTCTCCTACG-1 
##                  5                  6                  2                  3 
## CGGAGAAAGCGCCTTG-1 CGGAGAACAGGACATG-1 CGGGACTTCAAATAGG-1 CGGGCATCACCACATA-1 
##                  2                  0                  0                  0 
## CGGGTCATCGTTCTCG-1 CGGGTGTAGTTGAATG-1 CGGTCAGAGTCGGCAA-1 CGGTCAGCAGAGCGTA-1 
##                  4                  0                  4                 10 
## CGGTCAGTCATGGCCG-1 CGTAAGTTCAATCTTC-1 CGTAATGGTTAGGCCC-1 CGTAATGTCGGCATCG-1 
##                 10                  6                  4                  3 
## CGTAGTACAGTTAGGG-1 CGTCAAACAGTCAACT-1 CGTCAAAGTCATAGTC-1 CGTCCATAGAACCGCA-1 
##                  0                  3                  1                 10 
## CGTCCATAGGCATGCA-1 CGTCCATAGTCAGCCC-1 CGTCCATTCACTACGA-1 CGTGAATGTCCATACA-1 
##                  3                  2                  1                  5 
## CGTGCTTCACTGGAAG-1 CGTGCTTGTGTCCACG-1 CGTGCTTGTTGGCTAT-1 CGTGTCTCAAAGCTAA-1 
##                  0                  1                  8                  0 
## CGTGTCTCAGGTGGAT-1 CGTGTCTCATCTCAAG-1 CGTGTCTGTCACAATC-1 CGTGTCTTCGGCTTGG-1 
##                  0                  6                  9                  7 
## CGTTAGATCGAGTCTA-1 CGTTCTGCAGAGTTGG-1 CGTTCTGGTCAGTCCG-1 CGTTCTGGTGGCTCTG-1 
##                  5                  8                  5                  1 
## CGTTGGGAGCCGATCC-1 CGTTGGGGTGACCTGC-1 CGTTGGGTCAGATTGC-1 CTAACCCAGTCATTGC-1 
##                  2                  9                  3                  6 
## CTAACCCCACGTTGGC-1 CTAACTTAGGGAACAA-1 CTAACTTAGGTATAGT-1 CTAAGTGAGGTAGCAC-1 
##                  1                  3                  4                  1 
## CTAAGTGCAGCGCTTG-1 CTAAGTGGTCATGCAT-1 CTAAGTGGTCCTGTTC-1 CTAAGTGGTGGCAACA-1 
##                  5                  9                  3                  0 
## CTAAGTGTCGCCAACG-1 CTAAGTGTCGTTGTTT-1 CTACAGAAGTAAGACT-1 CTACAGAGTCTACACA-1 
##                  8                  0                  7                  3 
## CTACAGATCAGCACCG-1 CTACATTTCTCTAAGG-1 CTACCTGAGTTGGGAC-1 CTACGGGCAACCGTGC-1 
##                  0                  2                  1                  2 
## CTACGGGCATGGACAG-1 CTAGACACATCCTTCG-1 CTAGGTAAGTGAGTTA-1 CTAGGTAGTGATTAGA-1 
##                  1                  2                  2                  1 
## CTATAGGCAGCACAGA-1 CTATAGGGTCAGCGTC-1 CTATAGGGTCCCACGA-1 CTATAGGGTTTCGACA-1 
##                  0                  1                  0                  3 
## CTATCCGTCGTCTAAG-1 CTCAAGATCAAGCTGT-1 CTCAATTAGATAGCAT-1 CTCAATTGTAGACTGG-1 
##                  4                  0                  5                  0 
## CTCAATTTCGTTCTGC-1 CTCACTGCAGTCTTCC-1 CTCACTGTCATCGCTC-1 CTCACTGTCTGACGCG-1 
##                  0                  0                  0                  3 
## CTCAGAAGTAATGCTC-1 CTCAGAATCCGTTGGG-1 CTCAGGGAGATAGCAT-1 CTCAGGGGTACCCAGC-1 
##                  7                  3                  3                  5 
## CTCAGTCCAACCGCTG-1 CTCATCGCAACATACC-1 CTCATCGCAGCAGTGA-1 CTCATCGGTCGTGCCA-1 
##                  1                  5                  0                  0 
## CTCATCGGTTGAATCC-1 CTCATGCGTTAATCGC-1 CTCATTAAGTTTGTCG-1 CTCATTACAACCACGC-1 
##                  0                  0                  0                  7 
## CTCATTAGTTCATCTT-1 CTCATTATCAGTAGGG-1 CTCATTATCGGCATTA-1 CTCCAACCACACCTAA-1 
##                  1                  3                  0                  1 
## CTCCACATCTCCCATG-1 CTCCATGAGCGTGAAC-1 CTCCCAAGTACTAACC-1 CTCCCAATCTTCGGTC-1 
##                  3                  3                  0                  3 
## CTCCCTCAGATCCGAG-1 CTCCCTCCAACTTCTT-1 CTCCCTCCATCTCAAG-1 CTCCGATAGTACCATC-1 
##                  0                  1                  2                  7 
## CTCCGATGTAGGGAGG-1 CTCCTCCAGGTTAAAC-1 CTCCTCCCATGACGAG-1 CTCCTTTAGCACCGTC-1 
##                 11                  0                  1                  0 
## CTCCTTTCACGGCGTT-1 CTCCTTTGTTTACCTT-1 CTCCTTTTCCTACGGG-1 CTCGAGGCACGTGTGC-1 
##                  5                  2                  7                  0 
## CTCGAGGGTCGCAGTC-1 CTCTCAGAGTTTGCTG-1 CTCTCGACAGAACTTC-1 CTCTCGAGTCGTGATT-1 
##                  1                  0                  4                  0 
## CTCTCGAGTGACTAAA-1 CTCTGGTAGCCAGAGT-1 CTCTGGTCAGAAACCG-1 CTCTGGTGTGCGTTTA-1 
##                  2                  1                  3                  2 
## CTCTGGTTCGATGCAT-1 CTGAATGCACGACCTG-1 CTGAATGCACGCAAAG-1 CTGAGGCGTGTTCGTA-1 
##                  0                  0                  0                  0 
## CTGCAGGAGTCTGCAT-1 CTGCAGGGTATGTCAC-1 CTGCATCAGACATAGT-1 CTGCATCTCCCTCTTT-1 
##                  3                  2                  4                  3 
## CTGCCATCACGTCATA-1 CTGCCATGTGTCTTAG-1 CTGCCTAAGCTCATAC-1 CTGCCTAAGTTGAAAC-1 
##                  3                  5                  7                  2 
## CTGCCTACAGGTCAAG-1 CTGCCTATCTCATAGG-1 CTGCGAGAGGGCCTCT-1 CTGCGAGAGGTCACTT-1 
##                  1                  4                 11                  0 
## CTGCGAGAGTGGAAAG-1 CTGGACGAGTCTCGTA-1 CTGGCAGGTAGCACGA-1 CTGGCAGTCTCCCATG-1 
##                  1                  3                  0                  3 
## CTGGTCTTCCCAGGCA-1 CTGTACCAGCGCGTTC-1 CTGTACCGTTTCTTAC-1 CTGTAGAAGCACTCCG-1 
##                  9                  0                  0                  8 
## CTGTAGAAGGTAGACC-1 CTGTAGAAGGTGCATG-1 CTGTATTCATTCACCC-1 CTGTCGTCAAGAGTAT-1 
##                  5                  0                  1                  8 
## CTGTCGTCATATGCGT-1 CTGTCGTGTTGCGGCT-1 CTGTCGTGTTTGACAC-1 CTGTGAAAGGAGTATT-1 
##                  0                  0                  0                 11 
## CTGTGAATCTGGCTGG-1 CTGTGGGTCACCGCTT-1 CTTACCGCAAGACCGA-1 CTTACCGCATACATCG-1 
##                 11                  7                  6                  4 
## CTTAGGAAGTGGCCTC-1 CTTAGGAGTATACGGG-1 CTTAGGAGTCATCCGG-1 CTTAGGATCCCATAAG-1 
##                 10                  4                  9                  5 
## CTTCCGAAGCGGACAT-1 CTTCCTTAGCCAAGGT-1 CTTCCTTTCAGCATTG-1 CTTCGGTAGACCCGCT-1 
##                  5                  3                  3                  1 
## CTTCGGTCAGCTGTCG-1 CTTCGGTTCCCTTGTG-1 CTTCTAAAGGCTGTAG-1 CTTCTAATCACCCTGT-1 
##                  0                  1                  1                  1 
## CTTCTAATCGACTCCT-1 CTTCTCTAGTAGCATA-1 CTTCTCTGTAAGTAGT-1 CTTCTCTTCATGGCCG-1 
##                  5                  7                  3                  4 
## CTTGATTAGAGGCGTT-1 CTTGATTGTGCGCTCA-1 CTTTCAAGTCTGTCAA-1 GAAACCTAGAGAGAAC-1 
##                  0                  0                  2                 10 
## GAAACCTGTGTTGACT-1 GAAACCTTCAGGCGAA-1 GAAACCTTCGCAATTG-1 GAAATGAGTGCCTGAC-1 
##                  6                  2                  0                  4 
## GAAATGAGTTAGGCCC-1 GAACACTAGTGCCTCG-1 GAACACTCAGCTGTTA-1 GAACGTTAGGGATCGT-1 
##                  6                  0                  2                  4 
## GAACGTTGTCGCATTA-1 GAACTGTAGACTCCGC-1 GAACTGTCAATAGTGA-1 GAACTGTGTGTGAATA-1 
##                 11                  2                  2                  0 
## GAAGAATTCGGATAAA-1 GAAGCCCGTGACTAAA-1 GAAGCCCTCCGAACGC-1 GAAGCGAAGGTTCTTG-1 
##                  4                  1                  4                  1 
## GAAGCGAGTACTGCGC-1 GAAGGACCATGTGGTT-1 GAAGTAACATTACGGT-1 GAAGTAAGTCTGTAGT-1 
##                  3                  0                  1                  3 
## GAATAGAGTGGAATGC-1 GAATCACCAACAGCTT-1 GAATCACCATGACTCA-1 GAATCGTAGAACTGAT-1 
##                  0                  7                  0                  4 
## GAATCGTTCCGTCACT-1 GACACGCCAGCATGCC-1 GACACGCCATTGACAC-1 GACAGCCGTGTTCATG-1 
##                  0                  3                  0                  1 
## GACATCATCCAACCGG-1 GACCAATAGTAAACAC-1 GACCAATCACCCTAAA-1 GACCAATTCGCACTCT-1 
##                  3                  0                  3                  0 
## GACCCAGAGTTCCGTA-1 GACCCAGTCGTACACA-1 GACCGTGGTAGATTGA-1 GACCGTGGTAGCTGTT-1 
##                  0                  5                  2                  2 
## GACCGTGTCAGTCATG-1 GACCGTGTCTACCTTA-1 GACGCTGGTGGTTCTA-1 GACGCTGGTTCGTGCG-1 
##                  4                  3                  0                  4 
## GACGTTAAGCTGAGCA-1 GACTATGCAACAGCTT-1 GACTATGCACAACGAG-1 GACTATGCACAGCCTG-1 
##                  8                  3                  1                  0 
## GACTCTCGTAGAGTTA-1 GAGAAATCAGCGATTT-1 GAGAAATTCCAAACCA-1 GAGACCCGTACAGTTC-1 
##                  0                 10                  0                  1 
## GAGAGGTCAGGCGTTC-1 GAGATGGTCGCCGATG-1 GAGCCTGTCGCCTATC-1 GAGCTGCCACTAGGCC-1 
##                  1                  2                  1                  8 
## GAGCTGCTCCGTGCGA-1 GAGGCAACAAGTGGCA-1 GAGGCAACATCCTATT-1 GAGGCCTTCCGCAACG-1 
##                  3                  0                  1                  3 
## GAGGGATAGGTGATAT-1 GAGGGATCACTGGATT-1 GAGGGATGTCTAACTG-1 GAGGGTACATGGATCT-1 
##                  1                  9                  2                  6 
## GAGGGTAGTGTTGACT-1 GAGGGTAGTTAGGGAC-1 GAGTCTACAGCGCTTG-1 GAGTGAGGTGGCACTC-1 
##                  3                  5                  2                  5 
## GAGTGAGTCTTACGGA-1 GAGTGTTTCATTGCCC-1 GAGTGTTTCATTGCGA-1 GAGTTACTCATACGAC-1 
##                  1                  0                  9                  9 
## GAGTTACTCGTAGGAG-1 GAGTTGTAGGCTCACC-1 GAGTTGTCAAGTATAG-1 GAGTTGTCACCCAACG-1 
##                  8                  2                  3                  5 
## GAGTTGTGTCTGCATA-1 GAGTTGTTCGCCGAAC-1 GAGTTTGCACAATGAA-1 GAGTTTGGTAGTCGGA-1 
##                  6                  2                  0                  0 
## GAGTTTGTCGACATCA-1 GATAGAAGTAGAGGAA-1 GATCACATCCTCAGGG-1 GATCACATCTGAACGT-1 
##                  8                  9                  9                 10 
## GATCAGTCATCACGGC-1 GATCCCTAGTCACTCA-1 GATCCCTCAATCAAGA-1 GATCGTAAGTAGCCAG-1 
##                  4                  3                  2                  3 
## GATCGTATCTAATTCC-1 GATGAGGCACAAGGTG-1 GATGAGGGTAATCAGA-1 GATGAGGTCATGTCTT-1 
##                  4                  1                  7                  4 
## GATGAGGTCTGAGTCA-1 GATGATCCATATCTCT-1 GATGATCGTTAGGAGC-1 GATGATCTCTCCCATG-1 
##                  2                  2                  2                  3 
## GATGCTAAGAAGCGAA-1 GATGCTAGTCTCGGGT-1 GATGCTATCGCTGTCT-1 GATGGAGCACAACGCC-1 
##                  8                  5                  4                  9 
## GATGGAGTCGTTAGAC-1 GATGTTGCACGATTCA-1 GATTCGACAGGCGAAT-1 GATTCGATCCCTGGTT-1 
##                  7                  0                  0                  4 
## GATTCGATCTTACCAT-1 GATTCTTGTTAGCTAC-1 GATTCTTTCCGAGTGC-1 GATTCTTTCCTTCACG-1 
##                  3                  7                  5                  1 
## GATTGGTGTAGTTCCA-1 GATTGGTTCTCCCTAG-1 GATTTCTAGTCGAAAT-1 GATTTCTCAATAGAGT-1 
##                  1                 11                  5                  7 
## GATTTCTTCCCATGGG-1 GCAACATAGGTAAAGG-1 GCAACATCAAATAGCA-1 GCACATATCCAGTTCC-1 
##                  3                  0                  2                  0 
## GCACGGTCACGAAAGC-1 GCACGGTCAGGAGACT-1 GCACGTGCAAGTGCTT-1 GCACGTGTCTCTGACC-1 
##                  1                  4                  0                  8 
## GCACGTGTCTGTCTCG-1 GCACGTGTCTTGGATG-1 GCACTAAGTCGTCTCT-1 GCACTAAGTGTGCTTA-1 
##                  2                  3                  5                  5 
## GCACTAAGTTCGGTAT-1 GCAGCCAAGTCGAAAT-1 GCAGCCACAGCCCAGT-1 GCAGCTGCAGCGAACA-1 
##                  0                  8                  0                  1 
## GCAGCTGGTGCCCAGT-1 GCAGGCTGTCGGCCTA-1 GCAGGCTGTTAGAAAC-1 GCAGTTATCTGCACCT-1 
##                  1                  3                  3                  5 
## GCATCGGAGCTAGAAT-1 GCATCGGCAGTAGGAC-1 GCATCGGGTAACATGA-1 GCATCGGTCAGGAAAT-1 
##                 10                  5                  3                  8 
## GCATCGGTCTGCGAGC-1 GCATCTCCATCTATCT-1 GCATCTCGTCTACAAC-1 GCATGATCATTACTCT-1 
##                  0                  2                  1                  1 
## GCATGATGTACATACC-1 GCATTAGCAAGACGAC-1 GCATTAGGTCTGGTTA-1 GCATTAGTCGGCATAT-1 
##                  0                  6                  5                  0 
## GCCAACGTCTCTCGCA-1 GCCAGGTGTTTCCAAG-1 GCCAGTGAGGCACTAG-1 GCCAGTGAGTGCCTCG-1 
##                 10                  4                 11                  3 
## GCCATTCAGATGACCG-1 GCCATTCCAACAGAGC-1 GCCATTCCATTCAGGT-1 GCCATTCTCAGGCGAA-1 
##                  4                  8                  1                  0 
## GCCCAGAAGCGGGTAT-1 GCCCAGAGTGCTTATG-1 GCCCAGATCAGGGTAG-1 GCCCAGATCCTGTAGA-1 
##                  0                  2                  1                  1 
## GCCCGAACATACCACA-1 GCCCGAAGTTGTGCCG-1 GCCCGAATCTATTTCG-1 GCCGTGATCCACGTCT-1 
##                  9                  1                  2                  6 
## GCCTGTTCACGGAAGT-1 GCGAGAATCCCTTGTG-1 GCGGAAACATGGGTCC-1 GCGGAAATCAGACCTA-1 
##                  4                  1                  7                  2 
## GCGGATCTCTCCCATG-1 GCGGATCTCTGCACCT-1 GCGTGCAAGCCAAGGT-1 GCGTGCATCAACGCTA-1 
##                  3                  6                  5                  0 
## GCGTTTCGTGGTAATA-1 GCTACAAAGAACTGAT-1 GCTACCTCAAGTATAG-1 GCTACCTCATCGATCA-1 
##                  8                  6                  0                  2 
## GCTACCTTCCCTCATG-1 GCTCAAATCACACCCT-1 GCTCAAATCAGTGTTG-1 GCTCAAATCCGTGGTG-1 
##                  6                  3                  3                  1 
## GCTGAATAGATTAGTG-1 GCTGAATGTAGTTACC-1 GCTGCAGCATCATCCC-1 GCTGCAGGTCTTGTCC-1 
##                  3                 10                  6                  5 
## GCTGCAGTCCCAGGAC-1 GCTGGGTGTCAACGCC-1 GCTGGGTTCAGTGTGT-1 GCTTCACAGCTTAAGA-1 
##                  3                  0                  4                  5 
## GCTTCACCATTGTGCA-1 GCTTGGGGTATAGGGC-1 GCTTGGGGTCTAACGT-1 GCTTGGGGTGATGAAT-1 
##                  8                  3                  0                  2 
##  [ reached getOption("max.print") -- omitted 676 entries ]
## Levels: 0 1 2 3 4 5 6 7 8 9 10 11

Clustering

We then visualise the clustering using the Dimplot fucntion which will use the active.ident slot by default.

DimPlot(Nd1T.obj.filt, reduction = "umap")

Compare clustering

We can quickly compare the clusters from Bioconductor and Seurat.

bioc.clusters <- data.frame(Bioc = sce.NeuroD1_filtered_QCed$label, row.names = sce.NeuroD1_filtered_QCed$Barcode)
seurat.clusters <- as.data.frame(Nd1T.obj.filt@active.ident)
colnames(seurat.clusters) <- "Seurat"
clustersAll <- merge(bioc.clusters, seurat.clusters, by = 0, all = FALSE)

Compare clustering

tab <- table(Bioc = clustersAll$Bioc, Seurat = clustersAll$Seurat)
rownames(tab) <- paste("Bioc", rownames(tab))
colnames(tab) <- paste("Seurat", colnames(tab))
library(pheatmap)
pheatmap(log10(tab + 10), color = viridis::viridis(100), cluster_cols = FALSE, cluster_rows = FALSE)

Find Markers

The FindAllMarkers function allows us to identify markers across all clusters.

Nd1T.markers <- FindAllMarkers(Nd1T.obj.filt, only.pos = TRUE)
## Calculating cluster 0
## Calculating cluster 1
## Calculating cluster 2
## Calculating cluster 3
## Calculating cluster 4
## Calculating cluster 5
## Calculating cluster 6
## Calculating cluster 7
## Calculating cluster 8
## Calculating cluster 9
## Calculating cluster 10
## Calculating cluster 11
head(Nd1T.markers[Nd1T.markers$cluster == 1, ])
##                 p_val avg_log2FC pct.1 pct.2     p_val_adj cluster    gene
## Pzp     1.715897e-267   5.646907 0.968 0.055 6.847289e-263       1     Pzp
## Habp2   3.308956e-228   4.367410 0.927 0.074 1.320439e-223       1   Habp2
## Crp     5.925219e-227   4.383579 0.943 0.079 2.364459e-222       1     Crp
## Adamts5 1.379875e-223   4.735019 0.785 0.027 5.506392e-219       1 Adamts5
## Cyp2j5  6.698187e-220   4.499152 0.822 0.040 2.672912e-215       1  Cyp2j5
## Ugt8a   9.152623e-191   4.181251 0.664 0.019 3.652354e-186       1   Ugt8a

Find Markers

And we can then visualise this using the VlnPlot function.

VlnPlot(Nd1T.obj.filt, features = "Pzp")